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Abstract 

Agents interacting with an incompletely known world need to be able to reason 
about the effects of their actions, and to gain further information about that world 
they need to use sensors of some sort. Unfortunately, both the effects of actions and 
the information returned from sensors are subject to error. To cope with such un- 
certainties, the agent can maintain probabilistic beliefs about the state of the world. 
With probabilistic beliefs the agent will be able to quantify the likelihood of the various 
outcomes of its actions and is better able to utilize the information gathered from its 
error-prone actions and sensors. In this paper, we present a model in which we can 
reason about an agent's probabilistic degrees of belief and the manner in which these 
beliefs change as various actions are executed. We build on a general logical theory of 
action developed by Reiter and others, formalized in the situation calculus. We pro- 
pose a simple axiomatization that captures an agent's state of belief and the manner 
in which these beliefs change when actions are executed. Our model displays a number 
of intuitively reasonable properties. 
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1 Introduction 



An intelligent agent interacting with a dynamic and incompletely known world faces two 
special sorts of reasoning problems. First, because the world is dynamic, it will need to 
reason about change: how its actions and the actions of others affect the state of the world. 
For example, an agent will need to reason that if a fragile object is dropped then it will 
probably break, and regardless of what else happens, the object will remain broken until it 
is repaired. Second, because the world is incompletely known, the agent will need to make 
do with partial descriptions of the state of the world. As a result, the agent will often need 
to augment what it knows by performing perceptual actions, using sensors of some sort. 
Unfortunately, both the effectors that the agent uses to modify the world and the sensors it 
uses to sense the world will in practice be subject to uncertainty, i.e., they will be noisy. For 
example, a robotic agent may not know initially how far away it is from the nearest wall, but 
may have a sensor that it can use to obtain information about this distance. Because the 
sensor is noisy, a reading of, say, "3.1 meters" does not guarantee that the agent is actually 
3.1 meters from the wall, although it should serve to increase the agent's degree of belief in 
that fact. It may need to read this sensor (or additional sensors) a number of times to get 
a sufficiently reliable measurement. Similarly, if it attempts to move 0.7 meters towards the 
wall, it may well end up moving 0.8 meters due to inaccuracies in its effectors. Nevertheless, 
its degree of belief that it is closer to the wall should increase. In this paper, we propose a 
representational formalism to capture the reasoning required to keep the agent's beliefs about 
the world synchronized with the effects of the various actions it performs. Without such a 
synchronization the agent would be unable to modify its environment in any purposeful 
manner. 

Somewhat surprisingly, although the importance of dealing with dynamic and incom- 
pletely known worlds has long been argued within AI, very few adequate representation 
formalisms have emerged. We can classify existing ones into two broad camps. On the one 
hand, we have probabilistic formalisms such as Bayesian nets [|2[] for dealing with uncertainty 
in general, and the uncertainty that would arise from noisy sensors in particular. However, 
with the exceptions noted below, these probabilistic formalisms have not attempted to in- 
corporate a general model of action, i.e., representing what does and does not change as the 
result of performing an action. In addition, while it is possible to express in these formalisms 
probabilistic dependencies among variables, which are in essence atomic propositions, it is 
not easy to deal with many other forms of incomplete knowledge. For example, it is difficult 
to say that one of two conditions holds, or that all objects of a certain type have a certain 
property when it is not known what those objects are. Logical formalisms, on the other hand, 
with features like disjunction and quantification, are well suited for expressing incomplete 
knowledge of this type. Moreover, logical formalisms like dynamic logics, process logics, or 
the situation calculus, allow us to reason about the prerequisites and effects of actions. 

These observations suggest that it would be useful to have a formalism that allows us to 
combine actions, knowledge, (probabilistic) beliefs, and first-order reasoning. There has been 
work that puts some of these components together. For example, the frameworks of Bacchus 
U and Halpern || combines first-order reasoning and probability, but this work does not 
explicitly address reasoning about actions. The framework of Halpern and Tuttle [O has 
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probability, knowledge, time, and actions, but it is not a first-order formalism. Finally, the 
action network formalism of Darwiche and Goldszmidt || extends Bayesian nets to allow 
probabilistic reasoning about action and observation sequences and their effects, but it also 
is not a first-order formalism, nor can it easily deal with other forms of incomplete knowledge 
like disjunction. 

In this paper, we propose a framework that combines first-order reasoning, with reasoning 
about knowledge, probability, and actions. For reasoning about actions we start with a 



variant of the situation calculus |20| that incorporates the solution to the frame problem 
proposed by Reiter [25]. Building on this and on the work of Scherl and Levesque |2J|, who 
incorporated knowledge and perceptual actions into the situation calculus, we show how 
probabilities can be added and how the effects of actions on an agent's probabilistic beliefs 
can be modeled. 

We base our work on the situation calculus as it has proved to be a very convenient formal- 
ism for modeling actions, their prerequisites, and their effects, and for modeling incomplete 
knowledge of the state of the world. Of course, our approach inherits these features. In addi- 
tion, it allows us to build on previous work done in this framework and to take advantage of 
parallel developments. For example, although Reiter's solution to the frame problem is lim- 
ited in a number of ways, it has been extended to handle aspects of the ramification problem 



17fl , agent ability [fL3fl , continuous time [p5j] , and perceptual actions pSfl . Another exten- 



sion of the theory to deal with complex actions (sequence, iterations, recursive procedures, 
non-determinism, etc.), briefly described in Section^, has led to a novel logic programming 
language called GOLOG. Golog has proved to be useful for describing high-level robot and 



softbot control [15]. An implementation of GOLOG exists and a number of programs (in- 



cluding a banking agent softbot application |14|) currently run in simulation mode. A mail 
delivery application written in GOLOG also runs on two different robot platforms []3T|. We 
expect that the extensions we make here can be incorporated into this framework as well. 

Independently of the situation calculus, however, our formalism demonstrates an interest- 
ing interaction between actions and differing mechanisms for updating probabilistic beliefs. 
And as we hope to make clear from our presentation, much of what we do here could be 
carried out in other logical frameworks. 

The rest of the paper is organized as follows. In the next section, we briefly review the 
theory of action in terms of which our account is formulated: the situation calculus and 
the solution to the frame problem proposed by Reiter. In Section |3|, we examine how non- 
deterministic actions can be handled within this framework. In Section f|, we augment the 
framework by adding the notion of the agent's epistemic state. In Section [5] we show how 
probabilities can be added, and present a simple formalization within the situation calculus 
of the degree of belief an agent has in propositions expressed as logical formulas. This allows 
us to formalize in more quantitative terms the changes in belief that arise when dealing with 
noisy sensors and effectors. Examples of the formalism at work are presented in Section [7], 
and some conclusions are drawn in Section Bl. 
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2 A Theory of Action 



Our account of sensors is formulated as a logical theory in an extended version of the situation 
calculus. The situation calculus is a many-sorted dialect of the predicate calculus (with 
some second-order features), containing sorts for (among other things) situations, which are 
like the possible worlds of modal logic, for primitive deterministic actions, and, since we will 
be dealing with probabilities, for real numbers. 

The situation calculus is specifically designed for representing dynamically changing 
worlds. All changes to the world are the result of named actions. A possible world his- 
tory, which is simply a sequence of actions, is represented by a first-order term called a 
situation. The constant So is used to denote the initial situation, namely that situation in 
which no actions have yet occurred. There is a distinguished binary function symbol do; 
do(a, s) denotes the successor situation to s resulting from performing the action a. Actions 
may be parameterized. For example, put(x, y) might stand for the action of putting object 
x on object y, in which case do(put(A, B), s) denotes that situation resulting from placing 
A on B when in situation s. Notice that in the situation calculus, actions are denoted 
by first order terms, and situations (world histories) are also first order terms. For exam- 
ple, do(putdown(A), do(wa\k(L), G?o(pickup(v4), So))) is a situation denoting the world history 
consisting of the sequence of actions [pickup(A), walk(L), putdown(A)]. 

Relations whose truth values vary from situation to situation, called relational fluents , are 
denoted by predicate symbols taking a situation term as their last argument. For example, 
Is Carrying (robot, p, s), meaning that the robot is carrying package p in situation s, is a 
relational fluent. Functions whose denotations vary from situation to situation are called 
functional fluents. They are denoted by function symbols with an extra argument taking a 
situation term, as in position(robot, s) , i.e., the robot's location in situation s. Finally, we 
use the distinguished predicate Poss(a, s) to state that action a is possible to execute in 
situation s. 



2.1 Actions 

Actions have preconditions — necessary and sufficient conditions that characterize when the 
action is physically possible. For example, in a blocks world, we might have:[] 

Poss(pickup(a;), s) 

= -<3z. Holding(z, s) A NextTo(x, s) A -^Heavy(x), 

which says that an object x can be picked up if and only if the agent is not currently holding 
something, it is next to x, and x is not heavy. 

Definition 1 : [Precondition Axioms] Precondition axioms are axioms of the form 

Poss(a, s) = (f)(a, s). 

Here is a formula that specifies the necessary and sufficient properties of the situation s 
and the action a, under which a can be executed in s. | 

formulas, free variables are considered to be universally quantified. This convention will be followed 
throughout the paper. 
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The manner in which the actions modify the world are specified by their effect axioms. 
These describe the effects of a given action on the fluents. For example, a robot dropping a 
fragile object causes it to be broken: 

Poss(drop(r, x),s) A Fragile(x, s) D Broken(x, do(drop(r, x), s)). (1) 

Exploding a bomb next to an object causes it to be broken: 

Poss(explode(5), s) A NextTo(b, x, s) D Broken(x, c?o(explode(6), s)). (2) 

A robot repairing an object causes it to be not broken: 

Poss(repair(r, x),s) D ~^Broken(x, c?o(repair(r, x), s)). (3) 

2.2 The Frame Problem 

As first observed by McCarthy and Hayes [fZCP , axiomatizing a dynamic world requires more 
than just action precondition and effect axioms. So-called frame axioms are also necessary. 
These specify the action invariants of the domain, namely, those fluents that remain unaf- 
fected by a given action. For example, an object's colour is not affected by a robot dropping 
something: 

Poss(drop(r, x), s) A Colour(y, s) = c D Colour(y, do(drop(r,x),s)) = c. 

A frame axiom describing how the fluent Broken is unaffected: 

Poss(drop(r, x),s) A ~^Broken(y, s) A y ^ x V ~^Fragile(y, s) 
D ~^Broken(y, do(drop(r,x),s)). 

The problem introduced by the need for such frame axioms is that we can expect a 
great many of them. Relatively few actions will affect the truth value of a given fluent; all 
other actions leave the fluent unchanged. For example, an object's colour is not changed 
by picking things up, opening a door, going for a walk, electing a new prime minister of 
Canada, etc. This is problematic for the axiomatizer who must think of all these axioms; it 
is also problematic for any automated reasoning process as it must reason efficiently in the 
presence of so many frame axioms. 

Suppose that the person responsible for axiomatizing an application domain has specified 
all of the causal laws for the world being axiomatized. More precisely, suppose they have 
succeeded in writing down all the effect axioms, i.e., for each fluent F and each action A 
that can cause F's truth value to change, they have written axioms of the form 

Poss(A, s) A R(x, s) D {->)F(x, do(A, s)), 

where R is a first-order formula specifying the contextual conditions under which the action 
A will have its specified effect on F. 

A reasonable solution to the frame problem would be a systematic procedure for gen- 
erating, from these effect axioms, all the frame axioms. If possible, we would also want a 
parsimonious representation for these frame axioms (because in their simplest form, there 
are too many of them). 
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2.3 A Simple Solution to the Frame Problem 

By appealing to earlier ideas of Haas 0, Schubert [pSfl , and Pednault [p4fl , Reiter |26| pro- 



posed a simple solution to the frame problem, which we employ in this work. It is best 
illustrated with an example. Suppose that Equations fj, 0, and [| are all the effect axioms 
for the fluent Broken, i.e., they describe all the ways that any action can change the truth 
value of Broken. We can rewrite (0) and (fj) in the logically equivalent form: 

Poss(a, s) A 3r. a = drop(r, x) A Fragile(x, s) 
V 3b. a = explode(6) A NextTo(b, x, s) 
D Brokenix, do(a,s)). 

Similarly, consider the negative effect axiom for Broken, (|3]); this can be rewritten as: 

Poss(a, s) A Br. a = repair(r, x) D ~^Broken[x, do{a, s)). 

In general, we can assume that the effect axioms for a fluent F have been written in the 
forms: 

Poss(a, s) A 7^(f, a, s) D F(x, do(a, s)), (4) 

Poss(a, s) A 7^(x, a, s) D ->F(x, do(a, s)). (5) 

Here jp(x,a,s) is a formula describing the conditions under which doing the action a in 
situation s causes the fluent F to become true in the successor situation do(a,s); similarly 
7f(x, a, s) describes the conditions under which performing a in s causes F to become false 
in the successor situation. Reiter's solution to the frame problem rests on a completeness 
assumption, which is that the causal axioms (f|) and © characterize all the conditions under 
which action a can lead to a fluent F(x) becoming true (respectively, false) in the successor 
situation. In other words, axioms (f|) and (|5|) describe all the causal laws affecting the truth 
values of the fluent F. Therefore, if action a is possible and F(x)'s truth value changes from 
false to true as a result of doing a, then jp(x, a, s) must be true and similarly for a change 
from true to false. Reiter shows how to automatically derive a success or- state axiom 
from the causal axioms (|j) and (||) and the completeness assumption. 

Definition 2: [Successor-State Axioms] Successor-state axioms are axioms of the following 
form 

Poss(a, s) D F(x, do(a, s)) 

= 7^(x, a, s) V (F(x, s) A ->Jf(x, a,s)). 

for relational fluents F, and of the form 

Poss(a, s) D f(x, do(a, s)) = z 
= <y f (z,x,a, s), 

for functional fluents /. These axioms characterize the state of a fluent F (or the value of a 
functional fluent /) in the successor situation given properties of the current state and the 
action executed. | 
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The successor-state axiom for F is a solution to the frame problem for that fluent. Notice 
that this axiom universally quantifies over actions a. In fact, the ability to universally 
quantify over actions is one of the keys to obtaining a parsimonious solution to the frame 
problem. 

Applying this to our example about breaking things, we obtain the following successor- 
state axiom: 

Poss(a, s) D Broken(x, do(a, s)) 

= 3r. a = drop(r, x) A Fragile(x, s) 

V 36. a = explode(6) A NextTo(b, x, s) 

V Broken(x, s) A Sr. a = repair(r, x). 

It is important to note that the above solution to the frame problem presupposes that 
there are no state constraints, as for example the blocks world constraint: Vs. On(x,y,s) D 
->On(y, x, s). Such constraints can implicitly contain effect axioms (so-called indirect effects), 
in which case the above completeness assumption will not be true. The assumption that there 
are no state constraints in the axiomatization of the domain will be made throughout this 
paper.0 



In general, for any application, we use what is called in [|T7[] a basic action theory consisting 
of the following axioms: 

• axioms describing the initial situation — what is true initially, before any actions have 
occurred. This is any finite set of sentences that mention only the situation term Sq. 

• action-precondition axioms (Defn. [I]), one for each primitive action 

• successor-state axioms (Defn. 0), one for each fluent 

• unique-name axioms for the primitive actions (saying that primitive actions with dis- 
tinct names are different) 



a set of domain- independent foundational axioms for the situation calculus jI7||. These 



are used to state that the situations are all and only those reachable from So by 
performing a finite sequence of actions. 



3 Nondeterministic Actions 

In the description of the situation calculus in the previous section, actions are taken to be 
deterministic in the sense that it is assumed that there exists a unique successor state for any 
action executed. To model noisy sensors and effectors, we first show how nondeterministic 
actions can be handled within this framework without giving up the simple solution to the 
frame problem outlined above. 

Nondeterministic actions are treated as being the actual execution of one of a range 
of primitive (deterministic) actions. Nondeterminism arises from the fact that we do not 
know exactly which primitive action was actually executed. Thus, when a nondeterministic 

2 In ]l7j , the approach discussed in this section is extended to deal with some forms of state constraints, 
by compiling their effects into the successor-state axioms. 
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action is executed by the agent, the current situation will have a set of successor states, 
one successor state for each possible primitive action. Nevertheless, we retain the property 
that each of the underlying primitive actions yields a unique successor state. Hence, the 
solution to the frame problem above can be applied to each of these states individually, and 
any properties than can be shown to hold of every successor state will then be provable 
consequences of the nondeterministic action. 

The idea can be illustrated by imagining a simple robot that can move along an un- 
bounded one-dimensional surface. In the deterministic case, we might have a functional 
fluent position(s) and an action exact-advance(x) that changes the position of the robot by 
some amount x. We could then write the following successor-state axiom for position: 



assuming that exact-advance is the only action affecting position^ 

The action exact-advance is deterministic — executing it yields a unique successor state. 
Suppose instead we want to model a nondeterministic action noisy-advance(x), which results 
in the agent moving a distance y which is approximately equal to x. Since we only allow 
deterministic primitive actions in our framework, we view noisy-advance(x) as the union of 
all the possible moves that the agent could actually make, given that it tries to move x. To 
capture this, we assume that the language contains primitive actions of the form advance(x, y), 
where x is the nominal distance the agent is trying to move, by sending an appropriate 
command to its on-board effectors, while y is the actual distance moved. The agent has 
control over the nominal distance, but no direct control over the actual distance moved. So, 
in fact, the agent cannot execute an instance of advance(x, y) directly. 

Nevertheless, we can write a successor-state axiom for position assuming that it can be 
changed only by the action advance. The axiom is the same as above except that the position 
is changed by the actual distance moved and the nominal distance is ignored. 



Although the agent cannot execute an instance of advance directly, it can activate its 
effectors specifying a particular nominal value for the distance to be moved. This corresponds 

3 We are taking some liberties here with notation and the scope of variables; the if-then-else formula 
should be viewed as an abbreviation for the formula 



Poss(a,s) D position(do(a, s)) 
if 3x. a = exact-advance(x) 
then position(s) + x 



(6) 



else position(s), 



Poss(a, s) D position(do(a, s)) 
if 3x, y.a = advance(x, y) 
then position(s) + y 



(7) 



else position(s). 



Poss(a, s) D position(do(a, s)) = z 

= 3x. a = exact-advance(x) A z = position(s) + x 
V ->3x. a = exact-advance(ir) A z — position(s). 



7 



to executing a nondeterministic action: execute advance(x, y) for a fixed value of x and a 
nondeterministic choice of y. 

To specify the execution of such actions we need to expand our notation for do to allow 
for more than one potential successor state.Q The first step is to allow more complex actions. 
Starting with primitive actions (the ones which already have names in the language), we 
form more complex actions by closing off under sequencing and nondeterministic choice). In 
particular, if 8\ and 82 are actions, then so are 81,82 (intuitively, perform 8\ and then 82), 
8\\82 (intuitively, nondeterministically choose one of 81 and 82 and perform it), and nx.Si 
(intuitively, perform 8\ for some nondeterministically chosen value of x; this is particularly 
interesting if x is a free variable in 8). Given a (complex) action 8, we follow the approach 



taken in GOLOG [15| and define Do(8, s, s + ) as an abbreviation for a situation calculus formula 
which intuitively reads "s + is a possible final situation arising from the execution of action 
8 in situation s" . Do is defined by the following recursive expansions: 

1. If 8 is a primitive action like exact-advance(x) or advance(x, y), then the expansion of 
Do(8, s, s + ) is: 

Do(a,s,s + ) d = Poss(a, s) A s + = do(a, s). 

2. If 8 is of the form 81 ; 82 , then the expansion is 

Do([5i; 8 2 },s,s + ) = Els'. Do(8\, s, s') A Do(82, s, s + ). 



3. If 8 is of the form <5i | <52 , then the expansion is 

00(8^82, s,s + ) = Do(8 u s,s + )V Do(82,s,s- 



4. If 8 is of the form ttx.8', then the expansion is 

Do([ttx.8'}, s, s + ) = f 3x. Do(8' , s, s + ). 
This expansion allows for a different successor state for each value of x. 

Now we can formalize a nondeterministic move action noisy-advance(a;) that is directly 
executable by the agent: 

noisy-advance(x) == iry. advance(a;, y). (8) 

In this form, noisy-advance does not give the agent any control over its movements since 
we have not specified any relationship between the nominal value x asked for and the actual 
value y achieved. One reasonable constraint between these two values might be to assert that 
there is an absolute bound on the effector's error. That is, there is a bound on the difference 
between the value asked for and the value achieved. This can be captured by asserting the 
following precondition axiom for advance: 

Poss(advance(a;, y),s) = \x — y\ < b, (9) 

4 Remember that do{. . .) is defined to be a first-order term. Hence, it can denote only a single individual, 
in this case a single situation. 
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where b is the magnitude of the maximum possible error. This axiom says that it is impossible 
to execute advance actions in which the actual value moved is more that b units different 
from the value asked for. When this precondition is combined with the definition of the 
nondeterministic action noisy-advance, it limits the choice of advance actions that can arise. 
Combining this precondition axiom, the definition of noisy-advance, and the expansions for 
Do, it is not difficult to see that the action noisy-advance(a;) can never yield a successor state 
in which the agent has moved an amount outside the range [x — b, x + b] . More formally we 
have clS db direct entailment that 

Z)o(noisy-advance(x), s, s + ) D \position(s + ) — position(s) — x\ < b. 

In Section |5|, we demonstrate how a probabilistic relationship can be specified between the 
nominal and actual values so as to capture the fact that these two values are correlated — 
although the agent's actions do not always have exactly their intended effect they do have 
an effect that is correlated with what was intended. 

Finally, it should be noted that we have added nothing to the previous situation calculus 
story except a few convenient abbreviations. Since the execution of any complex action 
ultimately reduces to the execution of some collection of deterministic actions, we can apply 
the previous solution to the frame problem to each of the possible final situations. 

4 Adding Knowledge 

When the agent executes a nondeterministic action like the noisy-advance action described 
above, the particular deterministic action that ends up being executed remains unknown to 
the agent. Hence, it is clear that in the presence of such actions (and in the presence of 
perceptual actions) there needs to be a distinction between the state of the world and the 
information the agent has about the state of the world. We would like to be able to talk 
about the effect of actions like advance on what an agent or robot knows or believes about 
where it is located. For example, doing a noisy-advance decreases the robot's certainty about 
its position, but doing a sensing action increases it (in a manner to be described below). To 
accomplish this, we need to be able to characterize and reason about the agent's knowledge 
and beliefs about the world, i.e., the agent's epistemic state, and to distinguish this from the 
actual state of the world. 

To talk about the actual state of the world in the situation calculus is easy: the current 
situation captures the current state of the world. To capture the agent's epistemic state 
we employ some standard ideas from modal logics of knowledge and belief. (See |J for 
an introduction, and section [7.2| for a discussion of how the approach taken here relates to 
the model of knowledge presented there.) In modal logic, an agent has a binary possibility 
relation or accessibility relation on possible worlds. The agent is then said to know a fact 
in world (or situation) s if is true at all the worlds the agent considers possible, as 
captured by the possibility relation. It is well known that various properties of knowledge 
can be captured by placing constraints on the possibility relation. In particular, if we take 
the possibility relation to be transitive and Euclidean, the agent has positive and negative 
introspection, so it knows what it knows and knows what it does not know. 



9 



These ideas were first applied in the context of the situation calculus by Moore |[21|| . 
who introduced a binary fluent K into the language to capture the possibility relation of 
modal logic. That is, K(s', s) holds if, in situation s, the agent considers situation s' to be 
possible. As in modal logic, we say that the agent knows in situation s if <f)(s') holds in all 
situations s' that the agent considers possible in s (i.e., such that K(s',s) holds). Moore's 
approach was extended and integrated with Reiter's solution to the frame problem by Scherl 
and Levesque |ES|. The key issue is defining an appropriate successor-state axiom for K, 



that specifies how the fluent changes after doing an action a in some situation s. In other 
words, we need to specify what has to be true of s' + for K(s' + , do(a, s)) to hold. 

Scherl and Levesque do this by dividing actions into ordinary ones (like moving) and 
knowledge-producing ones intended to change only the K fluent. However, their characteri- 
zation entails knowledge of the action executed; i.e., the agent always comes to know what 
action (of either type) was executed. We need to model the case where nondeterministic 
actions are being executed by the agent, and the agent does not come to know the exact 
action being executed. For example, if the action happens to be advance(a;, y), we do not 
expect the agent to know that this action was performed, since that would mean knowing 
the actual distance y moved. Rather, we expect the agent to know only the nominal value 
x that it tried to move and whatever constraints this places on y. That IS, clS 8b result of 
performing this action, the agent learns just that it tried to move x. 

Another way of thinking about this is that, after doing advance(x, y), the agent knows 
only that advance(x, y') was performed for some compatible value y'. To formalize this idea, 
we use a special predicate Ol(a, a', s), meaning a and a' are observationally indistinguishable, 
i.e., cannot be distinguished by the agent. We assume that as part of the background theory, 
the user specifies a collection of observation-indistinguishability axioms characterizing this 
predicate. 

Definition 3 : [Observation-Indistinguishability Axioms] Observation-indistinguishability 
axioms are axioms of the form 

Ol(a, a', s) = <p(a, a', s), 

one for each action a. Here <f) is a formula that characterizes the relationship between actions 
a and a' that makes them indistinguishable in situation s. These axioms specify the set of 
actions a' that the agent has no ability to discriminate from action a in situation s. That is, 
in situation s the agent cannot tell if action a or a' was executed. | 

For ordinary actions, i.e., primitive actions for which the agent knows when they are per- 
formed and knows all of their consequences, this axiom would say that Ol holds iff a = a'. 
For example, if pickup(a;) is an ordinary action, we would have 

Ol(pickup(a;), a', s) = a' = pickup(a;). 

For actions like advance(x, y), however, the axiom would be 

Ol(advance(a;, y), a', s) = 3y' . a' = advance(a;, y'). (10) 
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This says that, for example, the agent cannot distinguish between the executions of the 
actions advance(3.0, 3.2) and advance(3.0, 2.8), since both involve attempting to move the 
same 3.0 units. 

Using Ol, we can specify a new successor-state axiom for K that correctly handles actions 
like advance(x, y). 

Definition 4: [K's Successor-State Axiom] 

Poss(a, s) D K(s' + , do(a, s)) 

= 3a', s'. s' + = do(a', s') A Poss(a', s') A Ol(a, a', s) A K(s', s) 



This axiom says that when the agent is in situation do(a, s) (after having executed action a in 
situation s), the set of .ff -related situations (that define its state of knowledge) are precisely 
those situations that are the result of executing an observationally indistinguishable action 
a' from some situation s' that satisfies the preconditions of a' and was .ff -related to the 
previous situation s. Intuitively, when the agent is in s, as far as it knows it could be in 
any situation s' that is i^-related to s. Hence, as far as it knows it could have executed 
a in any of these situations, and all of the a successors of these situations must be in its 
new knowledge set. Moreover, it does not even know if it in fact executed a; it could have 
executed any a' indistinguishable from a, so the a' successors of these situations must also 
be in its new knowledge set. However, it does know that an action indistinguishable from 
a was successfully executed. Hence, it can eliminate from its knowledge set the successors 
of those situations that fail to satisfy the preconditions of some a'. It is not hard to show 
that for ordinary actions like pickup(x), which are observationally indistinguishable only from 
themselves, this axiom reduces to the axiom given by Scherl and Levesque |[28|| ,p| 

As we said earlier, we define knowledge to be truth in all worlds the agent considers 
possible. For example, to say that the agent knows in situation s that object x is not 
broken, we would state W. K(s f , s) D -^Broken(x, s'). It is convenient to introduce special 
syntactic machinery to express this condition: 

Definition 5: [Know] Let s now be a special situation term. We write KNOw(0[s now ], s) 
to indicate that the agent knows 0[s„ ou ,] in situation s. Thus, Know(0[s„ ow ], s) is an ab- 
breviation for the formula W. K(s', s) D 4>[s now / V], where s 1 is some variable not appearing 
anywhere in 0, and 4>[s now /s'} is the result of replacing all freef] occurrences of s now in by 
s'. I 

For example, Know (-iBroken(x, s now ), s) is an abbreviation for the above formula. Notation- 
ally, it is often convenient to suppress the s now term and simply write Know(-i Broken(x), s), 
assuming that it will be clear from context where the implicit situation variable s' needs to 
be inserted. 



5 See Section 7.1 for how this axiom also correctly handles Scherl and Levesque's knowledge producing 
actions. 

6 An occurrence of s now is considered to be bound if it appears within the scope of Know and free 
otherwise. 
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Example 4.1: Given the axioms above, it is easy to show that after doing a noisy-advance(x), 
the agent will know that its current position is its previous position plus x, give or take b 
units. More precisely, let us assume that we have a special function prev(s) which for sit- 
uations other than Sq denotes the situation immediately before s.f\ Then these formulas 
entail 

£>o(noisy-advance(a;), s, s + ) 

D K~NOW(position(s now ) G [position(prev(s now )) + x — b, position(prev(s now )) + x + b], s + ) 

or, in more detail, 

Z}o(noisy-advance(x), s, s + ) 

D Vs /+ . K(s' + , s+) D 3a', s'. s' + = do(a', s') 

A (positions') + x — b) < position(s' + ) < (position(s') + x + b). 

Briefly, any situation s + that is the result of executing noisy-advance(x) must have been 
the result of executing advance(s, y) for some y. According to (0), advance(x, y) is executable 
only if | a; — y\ < b. Hence, from (|7]) we know that in s + , the agent's position is within b 
of its position at s plus x. By Defn. [| and (|T0|), this holds in all of the situations that are 
.fT-related to s + as well: all of these situations are also the result of executing advance(x, y) 
for some y with \x — y\ < b. Hence, the agent knows this in s + . 

It is not hard to show that the agent would also know that this property holds before 
executing the noisy-advance(s) action, that is, that the formula KNOW(/0, s) is entailed, where 
ip is 

Vs + .-Do(noisy-advance(x), s n0WJ s + ) 

D position(s + ) E [position(s now ) + x — b, position(s now ) + x + b]. 

Observe that a noisy advance decreases positional certainty, in that even if the agent 
knows its current position precisely, after doing a noisy advance it will know its position 
only to within b units. Similarly, if the agent knows its current position to within a units it 
will know its position only to within b + a units after doing a noisy advance. In particular, 
multiple advances add to the agent's uncertainty, so the following formula is also entailed. 

Z?o(noisy-advance(x) ; noisy-advance(y), s, s + ) 

D K~NOW(position(s + ) G [position(prev(prev(s now ))) + x + y — 2b, 
position(prev(prev(s now ))) + x + y + 26], s + ) 

I 

4.1 Sensing Actions 

Sensing actions can also be handled within the account we have developed so far. Sensing 
actions are actions that are executed primarily for the change they produce on the K fluent; 
pure sensing actions would affect no other fluent, and would appear only in the successor- 
state axiom of the K fluent. 

7 We can characterize this function by the axiom prev(do(a, s)) = s. 
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For example, imagine that the robot has a sensor that senses its current location in our 1- 
dimensional world. The sensor is subject to error, so the value it returns is unlikely to be the 
exact location. We can model the act of using such a sensor with an action sense-position (x, y) 
analogous to advance(x, y). In this case, the nominal value x would be the value appearing 
on the sensor when the sensing action is executed, and the actual value y would be the true 
position. As with advance, we assume that the agent cannot observe the actual value. This 
gives the following observation-indistinguishability axiom for sense-position: 

Ol(sense-position(:r, y), a', s) = By', a' = sense-position (x, y'). (11) 

What makes sense-position (x, y) carry information is the fact that the true position y 
places constraints on the value of x that can be read from the sensor. This is captured by 
the precondition axiom for this action, which is 

Poss(sense-position(x, y), s) = y = position's) A \x — y\ < c. (12) 

In this case, we have simply asserted that there is a bound on the error in the value read 
from the sensor: it must be within c units of the true value. 

The main difference between sense-position and advance is that with sense-position the robot 
does not have the ability to select either the nominal value read x or the actual value y. The 
first of these is chosen nondeterministically, and depends on the error in that activation of 
the sensor; the second (by virtue of the above precondition axiom) is determined by the 
actual position of the robot. Thus, we assume that the robot merely gets to execute the 
nondeterministic action noisy-sense-position which takes no arguments: 

noisy-sense-position = f irx, y. sense-position (x, y). (13) 

Example 4.2: Given the axioms above, we can show that after doing a noisy-sense-position, 
the agent will know what its current position is to within c units. More precisely, 

Z)o(noisy-sense-position, s, s + ) 

D 3x. Know (position e [x — c, x + c] , s + ), 

or, in more detail, 

3x3y s + = c?o(sense-position(x, y), s) 

D 3xVs /+ . K(s' + , s + ) D (x — c) < position(s' + ) < (x + c). 

is entailed by these axioms. 

Once again, any situation s + that can arise from executing noisy-sense-position must be 
the result of executing a particular instance of sense-position (s, y). According to (|12D , these 
instances are executable only if y = position(s) and x 6 [position(s) — exposition's) + c). 
Hence, there is a fixed value x (the value that will be read from the sensor) such that 
position's) G [x — c, x + c]. Furthermore, by (|7|) we see that sense-position actions do not 
affect the current position, so position(s + ) = position's) and the bound holds in s + as well. 
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Now if we examine all situations s' iT-related to s + we see that (1) they must be 
the result of executing an observationally indistinguishable action sense-position (x', y') in a 
situation s' that is ^-related to s, which by (|TT| ) means x' = x, and (2) sense-position^', y') 
must be possible in s', which by flEp means y' = positions') and y 1 G [x' — c, x' + c] . Putting 
these together we see that position(s') G [x — c, x + c] . Furthermore, again by (|7|), we obtain 
position{s' + ) G [a; — c, x + c] . Hence, the position in every iT-related state lies in a fixed 
range, so the agent must know this. | 

There are a couple of points worth noting. First, if the agent has very incomplete knowl- 
edge of its position, then its knowledge of its position will increase after reading its sensor. 
This increase of knowledge arises from the pruning away of iT-related situations that have 
divergent values of position. In particular, in situations with extreme values of position there 
will be no observationally indistinguishable action sense-position (x, y') that can be executed 
(the fact that y' must equal the situation's position value will violate the bound between y' 
and the fixed value of x). 

Second, if the agent has strong knowledge of its position, then that knowledge will not 
degrade after reading its sensor. For example, suppose that the agent knows the exact 
value of its current position. This means that position has the same value in all situations 
.fT-related to the current situation. In this case, the agent will continue to know its exact 
position after reading its sensor. This arises from the fact that all situations that are K- 
related to the successor state must arise from the execution of a sense-position action in a 
situation that was .fT-related to the current state. Since, sense-position actions do not change 
the value of position, they will all continue to share the same value of position. 

5 Likelihood and Degree of Belief 

Asserting bounds on the difference between the nominal values and the actual values in 
actions like sense-position (x, y) represents a rather weak model of action uncertainty. Suppose, 
for example, we have a sensor with an error bound of c = 2, and we make a number of readings 
of a particular fluent using the sensor, all of which are clustered around the value 3. For 
concreteness, suppose they are all between 2.8 and 3.1. As far as knowledge goes, all the 
agent will be able to conclude is that it knows the fluent to have a value in the range [1.1,4.8]. 
Getting numerous readings of 3 will not change this knowledge. Yet, even if the agent is 
using a cheap sensor, we might hope that getting many such readings would increase the 
agent's degree of belief that the true value of the fluent is indeed close to 3. 

To formalize these intuitions, we first quantify the notion of possibility (as captured by 
the K fluent) by associating with each world the agent considers possible the agent's degree 
of belief that that is the actual world. We can think of this degree of belief as a subjective 
probability. We then consider how these degrees of belief change over time, as a result of 
actions being performed. 

Degrees of Belief. It is convenient to capture degree of belief by first associating with 
each situation a weight, and then taking the degree of belief to be the normalized weight 
(since, like probability, we want the degree of belief in the whole space to be 1). We capture 
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the weight by using a new functional fluent p(s',s), analogous to K(s',s). This function 
denotes the relative weight given to situation s' by the agent when it is in situation s. 

We expect weights to be non-negative and that all situations considered impossible will 
be given weight 0. The following constraints, which we assume is part of the background 
action theory ensures that this is indeed the case initially: 



As we shall see, the successor-state axiom for p ensures that these constraints hold at all 
times. 

We take the agent's degree of belief in <p to be the total weight of all the worlds he considers 
possible where holds, normalized by the total weight of all worlds he considers possible. 
Thus, we are restricting to discrete probability distributions here, where the probability of a 
set can be computed as the sum of the probabilities of the elements of the set.Q We introduce 
special notation for this, just as we did for Know. 

Definition 6: [Bel] Let s now be a special situation term. We write BEL(0[s now ], s) to denote 
the agent's degree of belief in 0[s no ,„] when it is in situation s. This is an abbreviation for 
the term 



As with Know, we sometimes suppress the s now terms. In Appendix |A], we show how the 
summations in this formula can be expressed using second-order quantification. A logical 
consequence of this formalization is that Bel(-,s) is a probability distribution over the 
situations -fT-related to s. I 

Action-Likelihood Functions. Next we have to define a successor-state axiom for the 
p fluent, that is, we need to consider how the agent's probability distribution changes over 
time. It will not be necessary to characterize how belief changes, since belief is defined in 
terms of p. 

Suppose the agent performs an action a in some situation s. If a is an ordinary de- 
terministic action like pickup(s), then the agent knows it will end up in a unique situation 
do(p\ckup(x) , s) . Intuitively, in this case, the probability that it ascribed to s will be trans- 
ferred to do(p\ckup(x) , s). Similarly, this is true for all the situations that the agent considers 
possible at s. 

But suppose that a is an action like sense-position (x, y): sensing a value of x when the 
true position is y. In this case, the probability the agent ascribes to do(a, s) will not simply 
be what is transferred from s; how likely the action a is must also be taken into account. 
As discussed above, sense-position(x, y) is not an action that an agent can execute directly: 
it executes something like noisy-sense-position, and one of the sense-position actions is selected 
nondeterministically. But some of these are more likely to be selected than others. For 
example, if the true position in situation s is 3.0, we expect that sense-position(3.0, 3.0) is far 

8 This is in contrast to continuous probability distributions, in which the probability of each point is 0, 
so the probability of a set cannot be computed as the sum of the probabilities of the elements in the set. 
Instead we must integrate over the set. 
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more likely to be executed than sense-position(30.0, 3.0). Consequently, the probability the 
agent ascribes to the situation do(sense-position(3.0, 3.0), s) should be greater than what it 
ascribes to do(sense-position (30.0, 3.0), s). 

To specify these probabilities, we use what we call an action-likelihood function. We add 
a special function £(a, s) to our language, used to denote the probability (assigned by the 
agent in situation s) of primitive action a being selected among all its possible observationally 
indistinguishable alternatives. Thus, for example, £(sense-position(3.1, 3.0), s) is the likelihood 
that the agent would sense 3.1 when its true position was 3.0 in situation s. Note that if a is 
an ordinary deterministic action like pickup (x), then its only observationally indistinguishable 
alternative is a itself, so, as long as a is possible in s, we would normally have £(a, s) = 1. 

The error profile of the various effectors and sensors available to the agent is clearly 
application dependent, and it is this error profile that is captured by the £(a, s) function. 
Hence, we assume that the user specifies a collection of axioms characterizing £ for each 
action a. 

Definition 7: [Action-Likelihood Axioms] Action-likelihood axioms are axioms of the form 

£(a, s) = z = (f)(a, s). 

Here 4>(a, s) is a formula that characterizes the conditions under which action a has likelihood 
z in situation s. We will see some example axioms below. | 

With this machinery, we can give the successor-state axiom for p, and hence characterize 
how belief changes as the result of performing an action. Suppose situation s' + is of the form 
do(a', s'), that is, it is the result of doing action a' in situation s'. Then the weight that the 
agent assigns to s' + in do(a, s) is the product of the weight it assigns to being in situation 
s' and the likelihood of a' having been performed: the former is unless K(s', s) holds, in 
which case it is p(s',s); the latter is unless a' is observationally indistinguishable from a 
and possible in situation s', in which case it is £(a', s'). 

Definition 8: [Successor-State Axiom for p] The successor-state axiom for p is 

Poss(a, s) D p(s' + , do(a, s)) = 

if 3a', s'. s' + = do(a', s') A Poss(a', s') A Ol(a, a', s) A K(s', s) 
then p(s', s) x £(a', s') 
else 0. 



Note that the axiom captures the fact that any situation that is not .fT-related to the successor 
state will be assigned probability 0: the precondition to the then clause is precisely the 
condition required for a situation to be K related to the successor state (Defn. f§). It also 
ensures that when Poss(a', s') is false, the weight assigned to the situation do(a', s') will be 
0J 

9 This can also be achieved by eliminating Poss from this axiom and requiring £(a',s') to be when 
Poss(a', s') is false. 
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As a concrete example of how action-likelihood axioms can be used, consider an effector 
action like advance(x, y), where the agent chooses the nominal value x and the actual value 
y is nondeterministic. Suppose the correlation between x and y is described by a linear 
Gaussian model: the actual value y is the nominal value x plus some random noise that has 
a normal distribution. This means that y — x, the difference between the actual movement 
and the nominal movement, is normally distributed, with some mean [i and variance a 2 . If 
we further assume that the effector is calibrated so that there is no systematic bias, then 
/i = 0, and the distribution of y — x is characterized by Normal((y — x) / a) , where Normal(z) 
is the standardized normal distribution with mean and variance 1.Q 

Given these assumptions, the following axiom describes the action: 

£(advance(x, y), s) — Normal((y — x) / a) . (15) 

This axiom captures the linear Gaussian error model, asserting that the probability of 
the effector moving y units when asked to move x units falls off like the normal distribution 
as the difference between these two values increases. We can do something similar for sensor 
actions. If x is the value read and y is the actual value, we might once again assume that 
their difference x — y is normally distributed, say with variance a' 2 and mean 0. (Again, 
assuming a mean of implies that there is no systematic bias in the reading.) This gives us 
the following action-likelihood axiom: 

fl/ .. / \ \ f Normal((y — x)/a') if position(s) = y ,.„ s 

£(sense-position(x,w),s) = < _ uy ;/ ; ./ ... )( , (16) 

v y v ; [ if position(s) ^ y v ' 

which specifies that the sensor's error profile has a normal distribution, but with perhaps 
a different variance a' 2 . In either case, we could have also insisted that £(a, s) be when 
Poss(a, s) is false (for example, when the difference between the nominal and actual value 
exceeds some bound) , but as noted above, this is not necessary because of the way we have 
written the successor-state axiom for p. 

Example 5.1: When specifying the likelihood function we can encode complex context- 
dependent error profiles. Suppose have a fluent slippery that is true of situations where the 
robot is located on a slippery surface. We could account for a context-dependent effect on 
the robot's motion effectors with an axiom like 

£(advance(x, y), s) = 

if slippery (s) 

then Normal((x — y) /2.4) 
else Normal((x — y)/0.5) 

10 There is a slight subtlety here. The normal distribution is, of course, a continuous distribution, so the 
probability of a single real number is always zero. Only non-trivial intervals of the real line have positive 
probability. We are assuming that all probability functions are discrete here, so we really need to consider an 
approximation to the normal distribution. In examples like this, we can do so by treating action parameters 
as finite precision numbers. In this case, we interpret Normal(z) as being the integral of the normal density 
function over the range of precision of z. For example, if the actuator allows the agent to ask for only 
movements like 3.1 with 2 digits of precision, the range denoted by 3.1 could be 3.05-3.15, and Normal(3.l) 
would be the value of the integral of the normal density function over this range. Given some fixed precision, 
a table of values for Normal can be computed and added to the domain theory as a set of equations. From 
here on in, when we write Normal, we actually mean the discrete probability distribution that approximates 
Normal. 
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which indicates that the probability of error increases significantly when the surface is slip- 
pery (the standard deviation increases). To the best of our knowledge, no other formalism 
for representing the uncertainty associated with sensors and effectors allows this sort of 
context-dependent error profile to be expressed. 

Similarly, suppose that the effector's error was cumulative. That is, the further the robot 
tried to move the greater the error. Then an action-likelihood axiom like the following could 
be used: 

£(advance(x, y), s) = Normal((x — y) / (cx)) 

This axiom specifies that the standard deviation is some factor, c, of the nominal distance 
the agent is attempting to move. I 



6 Summary of the Formalization 

This concludes the formal details of our approach. But before looking at some its properties, 
let us summarize the components. To incorporate noisy sensors and effectors as well as 
degrees of belief into the situation calculus, we start with the situation calculus language 
from before, but add two additional fluents, K and p, a new distinguished predicate Ol and 
function £. We need to extend the basic theory of action (see Section [2.3|) to what we will 
now call an extended theory of action, consisting of the following axioms: 



• axioms describing the initial situation, as before, but now including Axiom |14] for K 
and p in S ; 

• action-precondition axioms, as before; 

• successor-state axioms, as before, but now including one for K (Defn. [|) and one for p 
(Defa.D; 

• unique-name axioms for the primitive actions, as before; 

• observation-indistinguishability axioms (Defn. |3[) characterizing Ol, one for each action; 

• action-likelihood axioms (Defn. |7]) characterizing £, one for each action; 

• various abbreviations for convenience, including Know, Bel, and Do; 

• axioms (given in Appendix [A]) to ensure that Bel is always a probability distribution. 

• the foundational domain-independent axioms. These need to be modified to take into 
account that because of its epistemic alternatives, Sq is no longer the only initial 
situation [12]. However, nothing we do here hinges on these details. 

As we shall show, all of the expected properties are then logical consequences of these 
axioms. In taking this approach, we obtain the full expressiveness of the situation calculus 
for reasoning about action, a simple solution to the frame problem, as well as an expressive 
language for dealing with the uncertainty arising from sensors and effectors. 
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7 The Formalism at Work 



We have presented a particular model for noisy actions and their effects on an agent's be- 
liefs. Our model involves using nondeterministic actions with probabilities attached to the 
various nondeterministic choices. This gives a model of actions that is essentially identical to 
standard probabilistic actions, where an action yields various successor states with varying 
probabilities. Although this is a fairly standard model, it is difficult to defend its correctness 
in any formal way. The best that can be done is to demonstrate that the model behaves in a 
way that matches our intuitions and properly replicates other more specialized approaches. 

7.1 Knowledge-producing actions 

In the Scherl and Levesque approach, in addition to ordinary actions like pickup (x), there 
are assumed to be special knowledge-producing action associated with various fluents. For 
example, to open a safe by dialing its combination, an agent needs to know what the com- 
bination is. If the combination of the safe is written on a piece of paper, there might be a 
knowledge-producing action read-combi whose effect is to change the the state of knowledge 
to make the value of the combi fluent known. 

We can model this type of action as follows. Assume we have a primitive action read (z) 
which is the action of reading the number z on the piece of paper. We assume that this 
action only happens if z is the combination of the safe, which we model using a precondition 
axiom. 

Poss(read(,2), s) = z — combi(s). 

In this case, there is no distinction between the number observed on the paper and the actual 
value, and so this action is observationally indistinguishable only from itself: 

Ol(read(z), a, s) = a = read(z). 

Finally, the agent cannot choose which number to see, it can only choose whether or not to 
see what is written on the paper. Consequently, we define read-combi by 

read-combi = f txz. read (z). 

It is then easy to show, that after reading what is on the paper, the agent knows the 
combination to the safe. That is, the following is entailed: 

£>o(read-combi, s, s + ) D 3z.K~NOW(z = combi, s + ). 

This approach clearly generalizes to handling knowledge-producing actions which tell the 
agent the truth value of some fluent. One nice advantage of the approach here is that, unlike 
in Scherl and Levesque, it is not necessary to mention by name the knowledge-producing 
actions in the successor-state axiom for K, which means that it is not necessary to know 
them all in advance. 
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7.2 The FHMV Model of Knowledge 

To further understand our approach to modeling the agent's knowledge and belief in the 
situation calculus, it is useful to compare it with the approach to modeling multi-agent sys- 



tems taken in || || (and expanded in [K| to allow probabilistic actions). In this approach, 
agents are always assumed to be in some local state. Besides the agents, there is an en- 
vironment, which at any given time is also in a particular state. The environment's state 
captures everything that is relevant to the description of the system and not captured by 
the agent's state. Thus, if there is only one agent (as is the case here), the system's global 
state at any time can be described by a pair (s e ,s a ), where s e is the environment's state 
and s a is the agent's state. The system's global state changes as a result of actions being 
performed. These actions may change either (or both) of the components of the global state. 
For example, a sensing action that did not change the environment could be modeled as 
changing only the agent component s a of the global state. 

What we are doing here using the situation calculus can be modeled within this framework 
by mapping situations to global states. That is, we map every situation s to a pair (s e , s a ). 
This mapping can be constructed in a variety of ways. Perhaps the easiest is to take s e , the 
environment's state, to consist of the values of the fluents (other than do, K, and p) in 5*0, 
together with the sequence of primitive actions used to generate the situation. Similarly, 
the agent's state, s a , can consist of an initial state that describes the agent's beliefs in 
So, together with a sequence that records what the agent learns as a result of performing 
the actions described in the corresponding environment state. In our approach, the agent 
learns only that an observationally indistinguishable action was executed; so an appropriate 
sequence for the agent state can be generated by replacing each action a in the corresponding 
environment sequence by a unique canonical representative from the set of actions that are 
observationally indistinguishable to a (with respect to the situation that a was executed 
in). We remark that Lakemeyer independently used this approach to give semantics to 
knowledge and only knowing in the situation calculus [[II] (also see [|12])- This method of 



mapping situations to global states captures the implicit situation calculus assumption that 
the agent remembers all of the actions that have been performed (modulo observationally 
indistinguishability). It is also obvious that from any global state (s e ,s a ) we have sufficient 
information to reconstruct the situation s that gave rise to it. 

With this view of situations, the definition of do for the FHMV model is immediate. If 
(s e , s a ) is the global state corresponding to situation s, then do(a, (s e , s a )) is the new global 
state (Sg, s' a ) where s' e is the result of appending a to s e , and s' a is the result of appending 
a canonical member of {x : Ol(a,x, s)} to s a . Similarly, there is an obvious definition for 
K as well, which is the same as that taken in || [27]: K((s' e ,s' a ), (s e ,s a )) holds precisely if 
s' a = s a , i.e., the agent has the same local state. Notice that this makes K an equivalence 
relation, that is, reflexive, symmetric, and transitive. Finally, we can model p by defining 
p((s' e , s' a ), (s e , s a )) to be equal to p(s', s) where s' is the situation corresponding to the global 
state (s' e ,s' a ). By normalizing summations over p we obtain a distribution that defines the 
agent's probabilistic beliefs when it is in global state (s e , s a ). 

Besides helping to clarify our approach to modeling the agent's epistemic state, the 
discussion above also shows that the situation calculus is not fundamental. We could have 



20 



taken essentially the same approach using a modal logic (the FHMV model is formalized in 
terms of modal logic) instead. 

7.3 Bayesian conditioning 

A standard model for belief update in the light of noisy information is Bayesian condition- 
ing. The standard Bayesian model assumes two pieces of probabilistic information: a prior 
distribution Pr(i) on the value t being sensed, and a conditional distribution Pr(x\t) that 
gives the probability of sensing x given that the true value is t. Furthermore, the standard 
model requires the assumption that the value read from the sensor is dependent only on the 
true value, and is thus independent of other factors given this value. 

Bayes' Rule is now applied to obtain a posterior probability Pr(t\x) over the values of t, 
given that the sensor read the value x: Pv(t\x) = Pr(x\t) Pr(i)/ Pr(x). The denominator — 
the prior probability of reading the value x — is the only unknown expression, but it can be 
easily computed. Since XV Pr(£'|x) = 1, we must take Pr(x) to be the normalizing factor 
XV Pr(x\t') Pr(t'). Since this normalizing factor is independent of t, we see that the key factor 
in determining the posterior probabilities is the numerator Pr(x\t) Pr(t), which describes the 
relative probability of different values of t given the observation x. 

Two significant assumptions are typically made when applying Bayesian conditioning. 
The first is that the world does not change while the observation is being made, and the 
second is that conditioning on a value (like x above) is the same as conditioning on the event 
of observing x. 

In general, there need not be a distinction between sensing actions, that only affect the 
agent's beliefs, and ordinary actions that affect the agent's environment: in our formalism 
it is quite possible to define actions that change the world in the course of making an 
observation. For such actions, conditioning the agent's beliefs only on the value sensed (as is 
done above) does not adequately describe the agent's new belief state. It is necessary to also 
take into account the effect of the other changes caused by the action (this is the so-called 
"total evidence" requirement of Bayesian conditioning). In our formalism the agent's new 
beliefs (as described by the updated p) is in fact affected by both the sensed value and any 
other changes caused by an action. 

To understand the second assumption, imagine that there are two fluents, Light- On and 
Book, where Light-On(s) holds if the light is on in situation s and Book(s) holds if there 
is a book in the room. Initially, say that the agent considers the four possible situations 
characterized by these two fluents as equally likely. Suppose the agent does a sensing action 
that detects whether or not the book is in the room, but only if the light is on; when the light 
is off nothing can be determined. If the sensing action does in fact detect the book, then the 
agent should ascribe probability 1 to the light being on. However, this is not the probability 
obtained by simply conditioning the initial uniform distribution on Book. The probability 
that the light is on given that the book is in the room is 1/2, but the probability that the 
light is on given that the book was observed to be in the room is 1. The naive application of 
Bayes' rule implicitly assumes that conditioning on the book being in the room is equivalent 
to conditioning on observing the book. 

In such examples, we are dealing with a sensing action that is dependent on more than just 
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the value (or truth) of the fluent being sensed. It depends on other features of the situation as 
well. As demonstrated in Example |5.1| , our formalism can model context-dependent sensing 
actions, and the agent's new beliefs will reflect both the values sensed as well as what can 
be learned about the context from that value. In this case, sensing the value true for the 
fluent Book also allows us to learn that Light-On is true. 

Our formalism does not force us to make these two assumptions, but when we do our 
approach updates beliefs in a manner identical to standard Bayesian conditioning. We 
demonstrate this in the next section. 

7.3.1 Noisy sensors 

To see how our formalism works, suppose the agent can perform a noisy sensing action 
noisy-sense-f to sense the functional fluent /. This action is the nondeterministic union of the 
primitive actions sense-f (a?, y), which denotes that x is sensed, but the true value of / is y. 
As we would expect, the precondition for the primitive action is 

Poss(sense-f(x, y),s)=y = f(s); 

the action sense-f(rr, y) is possible in situation s exactly if f(s) = y. We assume that the 
likelihood of these actions is otherwise independent of s, being dependent only on the actual 
value of / (which is equal to y by the above precondition) and the value sensed: 

£ (sense- f (x, y), s) = £(sense-f(x, y)). 

Note that this assumption is satisfied if we assume that the likelihood is a linear Gaussian 
function of x — y, as discussed in Section ||. We take noisy-sense-f(a;) to denote the complex 
action of observing x on the sensor; thus, 

noisy-sense-f (x) == ny. sense-f(x, y). 

We now want to make sure that the assumptions underlying Bayesian updating hold. To 
capture the first assumption, that the world does not change while the observation is being 
made, we simply ensure that sense-f(x, y) does not affect any other fluent besides K and p. 
In particular, using (f|) and (||), we ensure that the action does not appear in the positive or 
negative effect axioms for any other fluent. 

That the second assumption holds is almost immediate: the value x returned by the 
sensing action is influenced only by the true value of /. 

Now we derive a characterization of the agent's beliefs after reading the value x from 
the sensor. Suppose that s + is a possible successor state after reading the sensor, i.e., 
Do( noisy-sense-f (x), s, s + ) holds, so that s + is the result of executing sense-f(a;, y) in situation 
s, for some y and a fixed value of x. From the precondition of sense-f (a;, y), we see that 
we must have y = f(s), so there is actually only one possible successor state; that is, 
s + = do (sense- f(x, f(s)), s). Since sense-f does not affect /, we also have f(s + ) = f(s). 

From p's successor-state axiom (Defn. |8]), we have that p(s' + ,s + ) can be nonzero only 
for situations s >+ that are the result of executing an observationally indistinguishable action 
whose preconditions are satisfied in a state s'. From the axioms for sense-f, we see that all 
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such situations with non-zero weight must be of the form s' + = do(sense-f(x, f(s')), s'), so 
p(s' + , s + ) = p(s', s)£(sense-f(x, f(s')), s') for some s', and f(s' + ) = f(s'). 
Since Bel is an abbreviation for summations over p (Defn. we obtain 

Do(noisy-sense-f(x), s, s + ) D 
BEL(/(s now ) = t,s + ) 

= E{s' + :f(s' + )=t} P(s' + , S + )/ Es'+ P(S >+ , S + ) 

= E{s'-.f(s')=t} P( s '> s)£(sense-f(x, f(s')), s') / £ s , p(s', s)£(sense-f(x, f(s')), s'). 

It is not hard to show that the numerator of this last expression is the (unnormalized) belief 
that the reading is t and x is observed (which is the product of the prior belief that t will be 
read and the conditional probability of observing x given t), while the denominator is the 
(unnormalized) belief that x will be read. Thus, this computation simulates the Bayesian 
computation. This comes out even more clearly if we take advantage of our assumption that 
the likelihood is independent of s. With this assumption, the term £(sense-f(x, f(s')), s') is 
constant over the summation in the numerator. By dividing both numerator and denomi- 
nator by J2 S > p{s' i s), we can convert the sums over p into an expression involving BEL: 

Z}o(noisy-sense-f(x), s, s + ) D 

Rpt ( f( q \ - t ~+\ - BEL(/(g raow ) = t, s)£(sense-f(x, t)) (17) 

at J L{J{S now —Z,S — . , — r — — 77-— 

2^ 4 ,BEL( f{s now ) = t , s)£(sense-f(x, t )). 

So we see that the belief that the value of / is t after doing a sensing action is the product 
of the belief that the value of / was t before doing the sensing action and the conditional 
probability of reading the value x given that the true value is t, divided by a normalizing 
factor. Furthermore, if we examine the denominator, we see that it can be interpreted as 
the agent's prior belief that it would read the value x from its sensor. Again, our calculation 
is identical to the standard Bayesian calculation. 



7.3.2 Noisy effectors 

Noisy effectors modify the agent's beliefs in a similar manner. The key difference here is 
that these actions also change the world, so the agent's updated beliefs must reflect those 
changes. 

Suppose that we have an action change-f(x, y) that can be used to modify the current 
value of the fluent /. Let its precondition be 

Poss(change-f(x, y),s) = TRUE 

(indicating that the action is always executable^]), its likelihood function be 

£(change-f(x, y), s) = £(change-f(x, y)) 

11 This is similar to (^), the precondition to advance, except that we are not asserting any absolute error 
bound. 
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(indicating that the likelihood is not dependent on the situation), and its corresponding 
nondeterministic action (denoting that the effector was actuated with the nominal value x) 
be 

noisy-change-f(rr) = iry. change-f(:r, y). 

Furthermore, suppose that the only effect of change-f is to modify the fluent / (and 
suppose that it is the only action that does so). This is reflected in /'s successor-state 
axiom, which is 

Poss(a, s) D f(do(a,s)) = 
if 3x, y.a — change-f(x, y) 
then f(s) + y 
else f(s). 

Suppose that the agent attempts to increment the fluent / by executing the action 
noisy-change-f(x). From the precondition we can see that there are many different situa- 
tions s + such that £>o(noisy-change-f(x), s, s + ) (unlike the case above for noisy sensors, where 
there was a unique successor situation). Each of these situations is the result of execut- 
ing change-f(x, y), for some y, in situation s. That change-f changes the world is captured 
by the fact that for each of these successor situations s + = do(change-f(x, y), s), we have 
f(s+) = f(s) + y. 

As in our derivation for noisy sensors, we have 

Do(noisy-change-f(x), s, s + ) D 

Bel(/(s„ ow ) = t,s + ) 

= J2 {s ' + :f( s ' + )=t} P(S' + , S + )/ J2s'+ P( S>+ , S + ) 

— Ey J2{ s ' + :f(s' + )=tAs' + = do(chnngc-l(x,y),s')} P( S ' + J S+ ) / E s ' + P( S ' + ) S + ) 

= Ey E{ s ':f(s>)=t-y} p(s', s)^(change-f (x, y), s') / J2 y E s ' p(s', s)£(change-f(a;, y), s') 
= E y ^(change-f (x, y)) E{ s >:f(s>)=t- y } p(s', s) / E y ^(change-f (x, y)) E s > pW, s). 

Now dividing top and bottom by J2' s p( s \ s ) to convert to beliefs, and simplifying, using the 
observation that J2 y ^(change-f (re, y)) is 1, we obtain 

Z?o(noisy-change-f(x), s, s + ) D , , 

BEL{f{s now ) = t, s+) = Y, y ^(change-f(x, y))BEh{f{s now ) =t-y,s). ( ' 

There are two differences between this result and our previous result for noisy sensors. 
First, we have a summation over y, the possible values that could have been generated by 
the action. This summation arises from the fact that any situation prior to the action could 
have given rise to a situation in which f — t: all that is required is that an action with the 
appropriate effect on / be executed. In the noisy-sensor case, the actual value of the fluent 
is not changed, so we need to consider only the prior situations in which the fluent has the 
same value. Second, it seems that there is no normalizing constant in this case. However, the 
normalizing constant is present and equal to Ef J2 y ^(change-f(x, y))~BEL(f(s now ) —t' — y, s). 
It turns out that this sum is always equal to 1. 

There is another way of deriving this result. Consider the agent's beliefs about the 
value of the fluent / prior to executing change-f. The fluent / can be viewed as being 
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a random variable, with the probability that it takes on any particular value t equal to 
Bel(/(s„ ow ) = t, s). When the action noisy-change-f(x) is executed, the value y that it adds 
to / is also a random variable with distribution £(change-f(x, yj). Hence, the new value 
of / after executing noisy-change-f(x) is the sum of two independent random variables, the 
old value of / and the amount added y. A standard result from probability theory is that 
the distribution of the sum of two independent random variables is the convolution of their 
distributions. Our result above shows that the agent's new beliefs about / are in fact the 
convolution of its prior beliefs and the distribution on the amount by which / can change. 

7.4 Examples 

It is convenient to be able to talk about what happens to the agent's beliefs and knowledge 
after reading some particular sequence of values from its sensors. To do this, we introduce 
a parameterized version of noisy-sense-position: 

noisy-sense-position (x) = rcy. sense-position [x, y), (19) 

where only the value y is chosen (by the value of position in the situation). Intuitively, 
noisy-sense-position (x) is the "action" where the agent activates its sensor and then observes 
that the value x has been returned. This action is not actually executable by the agent, 
since it cannot choose to execute a noisy-sense-position(3.0) over, say, a noisy-sense-position(2.8). 
Nevertheless, we can write Z)o(noisy-sense-position(3.0), s, s + ), which asserts that the agent has 
executed a noisy-sense-position and read the value 3.0 from its sensor, ending up in s + . 

Example 7.1: Suppose that the agent is sensing its position position using noisy-sense-position 
actions. For the purposes of this example, let the action-likelihood axiom for sense-position 

be 

£(sense-position(a;, y),s) = if x — y 

then 0.5 
else if \x — y\ — 1 

then 0.25 
else 

Here we are assuming that position and the arguments to sense-position can take on only 
integer values (i.e., this is the precision of these numbers). The axiom specifies that there is 
zero probability that the sensor will read a position that is greater than 1 unit away from 
the true position, and that the sensor noise is independent of other features of the situation. 

Let the agent's initial beliefs regarding position be given by BEL(position(s now ) = t, So) = 
1/8, for t G {8,9,11,12}, and 3EL(position(s now ) = 10, So) = 1/2. Initially, the agent does 
not ascribe positive probability to any other possible value for position. This distribution of 
beliefs for the various values of position in So is shown in Figure [TJ. 

Suppose that the agent senses its position and observes the value 11. If 5*1 is a pos- 
sible successor situation, so that Do(noisy-sense-position(ll), So, Si) holds, a simple calcula- 
tion using (]T?D shows how the agent's beliefs change: BEh(position(s now ) = 10, Si) = 4/7, 
Bel (position(s now ) = 11, Si) = 2/7, and BEL(position(s now ) = 12, Si) = 1/7. The new 
distribution is also shown in Figure [I]. Since, with probability 1, the sensor returns a value 
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Figure 1: Sensing with Exact Motion 



that is within 1 unit of the true value, the agent now has degree of belief zero in the values 
8 and 9. 

Note that Figure |I] shows that the agent still that the most likely value of position is 10, 
even though its sensor returned the value 11. This arises from the agent's high prior belief 
in the value being 10. 

We have not specified an absolute error bound for sense-position, i.e., a value for c in (fl2|). 
implicitly assuming it to be greater than 1. If, for example, we were to specify that the agent 
knows that the sensor cannot return an error of greater than 2, i.e., if we set c = 2 in the 
precondition axiom, then we will have that the agent knows that position ^ 8. However, the 
agent will not know that position ^ 9, even though it still has zero degree of belief in this 
value. Our framework distinguishes between full belief and knowledge in this case. I 

Sequences of sensor readings of the same fluent, including sequences of readings from 
different sensors, are also handled correctly in our framework. Such sequences correspond to 
sequences of sensing actions, and thus are handled by a simple iteration of ([H]), given that the 
sensors satisfy the assumptions of that equation. In particular, if the sensor action likelihood 
function is dependent only on the actual value being sensed, then each sensor reading will be 
independent of all previous readings.0 As a result, after a sequence of sensing actions, the 
agent will come to have either greater or less certainty about the value of the sensed fluent, 
depending on whether or not the sequence of readings agree or not. 

Example 7.2 : Suppose that the agent executes another noisy-sense-position action in 

12 A situation is a world history consisting of the sequence of actions executed (Section ||). Hence, one can 
easily write a sensor-likelihood axiom that did display a dependency on previous readings, if, for example, 
one wanted to capture the properties of a sensor that displayed some form of hysteresis. 
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the situation Si. Further, suppose that the agent observes the same value as before 11, 
and let S2 be such that Do(noisy-sense-position(ll), Si, S2). Then, another application of 
(|I~7j) (applied to the agent's beliefs in Si), yields the belief distribution shown in Fig- 
ure [l]: BEL(position(s now ) = 10, S2) = 4/9, ~BEh(position(s now ) = 11,5*2) = 4/9, and 
Bel (position (s now ) = 12, 5*2) = 1/9. The agent's beliefs have converged more tightly around 
the value 11, since it has now sensed that value twice. | 

The most important part of our formalism is the natural manner in which effectors and 
sensors interact in their effects on the agent's beliefs. For simplicity, first we examine the case 
where the agent has exact control over a fluent, but can sense its value only approximately. 
Then we examine the case of a noisy sensor interleaved with a noisy effector. 

Example 7.3: Suppose that instead of the noisy effector noisy-advance, the agent has an ex- 
act effector action exact-advance that has no preconditions, and affects position as specified in 
Note that exact-advance is an ordinary deterministic action, so it is observationally indis- 
tinguishable only from itself, and it has a unit likelihood function: £(exact-advance(x), s) = 1. 

Let the agent move exactly 2 units backwards when it is in situation S^. Call the new 
situation S3 = G?o(exact-advance(— 2), 5*2). Then, the successor-state axiom for p and position 
imply that the agent's beliefs are precisely shifted to worlds in which it has moved backward 
2 units: we have a simple transfer of probability mass from each situation to its successor 
situation in which the agent has moved backward 2 units. 

Deterministic actions like this modify the agent's beliefs in a manner that is related to 
Lewis's notion of imaging [|n|. In imaging, beliefs are updated by transferring all probability 
mass to the "closest" world, rather than by renormalizing the mass after removing some 
worlds as when we condition. Here, every situation transfers its probability mass to its 
successor. Different things are true in these successor situations, as the action has effected 
various changes. This means that the agent will believe different things in the successor 
situation. Nevertheless, since all of these successor situations arise from the execution of the 
same action, the changes to the agent's beliefs are generally systematic. In particular, in the 
absence of ramifications, it will change its beliefs only about fluents affected by the action. 

In this example, we have that BEL(position(s now ) = t, S3) = B~EL(position(s now ) = 
t + 2, S2) for all t. The agent's shifted beliefs are shown in Figure |]. This is exactly how one 
would expect the agent's beliefs to change in response to an exact movement like this. | 

Example 7.4: Suppose that the agent again executes a noisy-sense-position action in S3 and 
observes the value 9. Let S 4 be such that Z)o(noisy-sense-position(9), 5*3, S4). A reading of 9 is 
consistent with the previous readings of 11, since the agent has moved back 2 units. Hence, 
as shown in Figure [I], it results in a further tightening of the agent's beliefs, around the value 
9. If the agent subsequently moves forward 2 units, executing an exact-advance(2) action, so 
that S5 = c?o(exact-advance(2), 5*4), its beliefs will then be clustered around 11, as shown in 
the figure: ~BEh(position(s now ) = 10, S 5 ) = 4/13, BEL(position(s now ) = 11, £5) = 8/13), and 
BEL(position(s now ) = 12, S 5 ) = 1/13. 

Intuitively, since the agent's exact-advance action incurs no error, we would expect that 
if the agent had sensed the value 11 in situation S2, then its beliefs about the distance to 
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the wall should not change after moving forwards and backwards an equal distance. Our 
model respects this intuition. In particular, the agent's beliefs in 5 5 are identical to what 
they would be if it had executed an noisy-sense-position(ll) in S2, as indicated in the figure 
by the diagonal arrow from S2 to S5. I 

Example 7.5: Now we examine the case of a noisy effector. Let us return to the effector 
noisy-advance, taking (|7|) to specify the successor-state axioms for position. Suppose that the 
likelihood axiom for advance is 

£(advance(x, y), s) = if x = y 

then 0.5 
else-if \x — y\ = 1 

then 0.25 
else 

Here, with probability 1, the difference between the agent's actual move and the move 
specified is no more than one unit. 

Starting at situation 5*5, with beliefs about position as given above, suppose that Sq is 
a successor state of an attempt to move forward 1 unit, so that £>o(noisy-advance(l), S$, Sq) 
holds. The likelihood axiom above indicates that in this case the probability the agent moves 
forward 2 units is 1/4, 1 unit is 1/2, and units is 1/4. Equation |l8|can be used to calculate 
the agent's new beliefs about its position. In particular, we obtain BEh(position(s now ) = 
10, 5 6 ) = 4/52, BEL(position(s now ) = 11, S e ) = 16/52, BEL(position(s now ) = 12, 5 6 ) = 
21/52, BEL(position(s now ) = 13, Sq) = 10/52, and BEL(position(s now ) = 14, Sq) = 1/52. 
This distribution is shown in Figure 0. 

Suppose that the agent now attempts to move back to its previous position by executing 
a noisy-advance(— 1). If situation S 7 is such that Do(noisy-advance(— 1), 5*6, 57) holds, then 
the agent's beliefs in S 7 are BEL(position(s now ) = 8, S 7 ) = 1/52, BEL(position(s now ) = 
9,5*7) = 6/52, BEh(position(s now ) = 10, 5 7 ) = 14.25/52, BEh(position(s now ) = 11, S 7 ) = 
17/52, BEh(position(s now ) = 12, S 7 ) = 10.5/52, BEL(position(s now ) = 13,57) = 3/52, and 
Bel (position (s„ ow ) = 14, S 7 ) = 0.25/52. 

We see from this example that in the presence of noisy motion, the agent's confidence 
in its location degrades; its beliefs spread out. And this effect increases the more it moves. 
Unless it wants to lose its way completely, it must resort to sensing. Figure |2| shows what 
happens to the agent's beliefs after it senses its position to be 11 in situation £7. If S$ 
is such that Do(noisy-sense-position(ll), S 7 , S$) holds, then the agent's new beliefs in S$ are 
BEL(position(s now ) = 8,S$) = 0, BEL(position(s now ) = 9,Sg) = 0, BEL(position(s now ) = 
10, Sg) = 57/235, BEL(position(s now ) = 11, Ss) = 136/235, BEh(position(s now ) = 12, S$) = 
42/235, BEL(position(s now ) = 13,5*8) = 0, and BEL(position(s now ) = 14,5s) = 0. Thus, the 
beliefs are considerably sharpened by the sensing operation. | 

The previous examples have employed discrete beliefs and likelihood functions, which 
make the computations more apparent. More practical models are also easily accommodated, 
and in fact sometimes yield simpler, albeit less obvious, computations. 
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Figure 2: Noisy Motion 



In robotics applications one of the most popular models of noisy sensors and effectors 
is the linear Gaussian model where the nominal value is the actual value plus a Gaussian 
noise factor (see, e.g., the various papers in [|J). Equations [L^ and [L6| are examples of linear 
Gaussian models. The next example shows how easy belief update is with such models. 

Example 7.6: Suppose that the agent has an action noisy-change-f for effecting changes to 



a fluent /, and an action noisy-sense-f for sensing its value, as in Sections [7.3. 1| and |7.3.2| . Fur- 
ther, suppose that the likelihood functions for both change-f and sense-f are linear Gaussian, 
so that 

£(change-f(x, y), s) = Normally — x)/a e ), 

and 

£ (sense- f (x, y), s) = Normally — x)/a s ). 

Suppose that the agent's initial beliefs about a fluent / are described by the normal 
distribution A^(// ,Cq) with mean \i and variance a\. That is, BEL(f(s now ) = t,S ) = 
Normal((t — /io)/o"o), where Normal(z) is the standardized normal distribution with mean 
and variance 1. 

Now suppose that a noisy-change-f(a;i) action is executed. Equation [IB] shows that the 
agent's new beliefs about / is that its value is the result of adding two independent random 
variables (which gives rise to the convolution of two distributions). In particular, the new 
value for the fluent / is the sum of the change actually generated by the action and /'s 
previous value. The likelihood axiom for change-f (xi, indicates that y, the change to / 
generated by the action, is a normally distributed random variable with mean x% and variance 
of. The agent's initial beliefs indicate that /'s previous value is also a normally distributed 
random variable with mean /i and variance <7q. 
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A well known (and easy-to-derive) result is that the sum of two normally distributed 



random variables is also normally distributed (e.g., see |18|, page 146]). In particular, we 
have that, after executing the action, the agent's new beliefs about / are described by 
another normal distribution with mean /ii = /i + x\ and variance o\ = a\ + <j\. 

Next, suppose that the agent executes a sensing action and reads the value z\ from its 
sensor, i.e., the action noisy-sense-f^) is executed. Equation 17 indicates that the agent's new 
beliefs about the value of / are the normalized result of multiplying its prior beliefs by the 
probability of observing Z\ given the value of /. Both of these quantities are again described 
by normal distributions. The prior beliefs are normally distributed, and for each value of /, 
the probability of Z\ is also normally distributed, according to the likelihood function above. 
In this case, another well-known result is that the posterior beliefs about / remain normally 
distributed (see, e.g., [|19|, page 2]). In particular, we have that the agent's new beliefs about 
/ are that its value is normally distributed with mean /i 2 = {z\o\ + /iiof) / \a 2 s + erf) and 
variance a\ = crfo%/ \a\ + of). 

It is easy to see that these equations can be applied iteratively to keep track of the agent's 
beliefs about a fluent's value given a sequence of modification to and sensing of this value. 
With a normal distribution we need keep track only of the current mean and variance to 
completely describe the agent's beliefs. 

The fact that a normal distribution is preserved under these types of updates, with 
easy-to-compute modifications to its parameters, forms the basis of the popular technique 
of Kalman filtering [H. In fact, this example is simply an instance of Kalman filtering, and 
it can be generalized within our framework to handle the situation where the fluent being 
modified and sensed is vector valued. In this case the above variances would be replaced by 
covariance matrices, and the computations to update the mean and variance would become 
matrix manipulations. I 



All of the previous examples involved the sensing or affecting of a numeric fluent. Our 
final example demonstrates that the formalism can also be used to model noisy actions that 
affect non-numeric fluents. 

Example 7.7: In Section § we used the example of dropping fragile objects. However, 
when a fragile object is dropped it does not always break, it breaks with some probability. 
We an model this situation quite easily in our formalism. 

Suppose that there are two primitive drop actions, drop-break and drop-not-break, charac- 
terized by the following axioms: 

Poss(drop-break(x), s) = Holding(x, s) A Fragile(x, s) 

Poss(drop-not-break(x), s) = Holding(x, s) 
Ol(drop-break(x), a', s) = a = drop-break(x) V a = drop-not-break(x) 
Ol(drop-not-break(x), a', s) = a — drop-not-break(x) V a' = drop-break(a;) 
£(drop-break(x), s) = if Fragile(x) then 0.8 else 
^(drop-not-break(x), s) = if Fragile(x) then 0.2 else 1. 
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Let the successor-state axiom for Broken be 

Poss(a, s) D Broken(x, do(a, s)) = a — drop-break(x) V Broken(x, s). 
Finally, suppose that the agent can execute a nondeterministic action drop: 

drop(x) = f drop-break(x)|drop-not-break.(x). 

These axioms specify that the actions drop-break and drop-not-break cannot be distin- 
guished by the agent. That is, the agent can execute a drop action, but does not know ahead 
of time whether or not dropping an object will cause it to be broken. If the object is not 
fragile only the drop-not-break action can be executed, in which case it has probability 1 of 
being executed. On the other hand, if the object is fragile, then the preconditions of both 
drop-not-break and drop-break are satisfied and either can be executed. The probability of 
drop-break being executed, i.e., the probability that a fragile object will break when dropped, 
is 0.8, while drop-not-break has probability 0.2. The successor-state axiom for break specifies 
that drop-break does in fact cause an object to become broken. 

If the agent executes drop(x) in situation s for some object x given that KNOW {holding (x), s) 
(so that the action is possible), then we obtain for any successor situation s + 

Do(drop(A), s, s + ) A K.NOW(holding(x) , s) 

D BEL(Broken(x, s now ), s + ) = 0.8 x BEL(Fragile(x , s now ), s). 

For example, if the agent's degree of belief is 0.5 that the object being dropped is fragile, 
then the agent will have degree of belief degree 0.4 that the object will be broken after being 
dropped. Similarly, if the agent is certain that the object is fragile, then it will have degree 
of belief 0.8 that the object will break after being dropped. This makes sense intuitively, as 
the object can be broken only by dropping if it was originally fragile, and even then there is 
a probability of only 0.8 that it will be broken by the drop action. | 



8 Conclusion 

We have demonstrated that noisy perception and actions can be modeled in the situation 
calculus by a simple extension of previous work. In particular, from the successor-state 
axiom for p, @, and a constraint on its values in So, (|14]), we obtain as consequences what 
many have argued to be the natural models for belief update from perception (Bayesian 
conditioning) and from actions (a form of Lewis's imaging) |23|. Most importantly, our 
formalism succeeds in capturing some key features of the interaction between these two 
models for belief change. 

Much of our approach can be exported to alternate formalisms. For example, instead of 
the situation calculus a modal logic could have been used. Similarly, the probabilistic com- 
ponent could be replaced with an alternate formalism, like Dempster- Shafer belief functions 
|30| or possibility measures f|. All that would be required is to replace the functional fluent 



p and axioms for Bel with fluents and axioms to support an alternate measure of belief. 
The likelihood functions could then be replaced with non-probabilistic functions to support 
an alternate rule of belief update. 
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As for future work, apart from addressing limitations of the formalism, there is its ap- 
plication to high-level agent control. In the GOLOG work mentioned in the introduction, the 
ability of an agent to execute a program depends on what it knows about the truth value 
of the test conditions in that program [|l^]. When an agent has only a degree of belief in 
the truth of a test condition in a program it can randomize the program's execution. That 
is, instead of executing a set of actions only when a test condition is true, it can randomly 
decide to execute the actions with probability p when its degree of belief in the test condition 
is p. The key issue will then be characterizing the program's effects. For example, one could 
construct programs that with probability bounded below by 1 — e achieve a certain condition. 
Or one could employ a richer model that included an assignment of utilities to situations, 
and attempt to construct programs that maximize the agent's expected utility. 

A Formalization of Belief 

In this appendix, we show how Bel can be formalized so as to ensure that it is in fact a 
probability distribution. 

As mentioned in Section we treat the situation calculus as a many-sorted dialect of the 
predicate calculus with some second-order features. Included among the sorts are situations 
and real numbers. For convenience, we also assume that the language includes the natural 
numbers as a sub-sort of the reals. In particular, in the language we have variables that 
range only over these specific sorts.0 

With sorted variables we can then formalize the summations used in our definition of 
Bel (Defn. H). Let 4>{s now ) be a formula over the special situation term s n0W) r and r' be 
variables of sort real, / and g be second-order function variables (ranging over all functions), 
and m, i, and j be variables of sort natural number. We define sum p (cj)(s now ), s) to be the 
summation of p in s over all situations s' satisfying 4>{s now / s'): 

sum p ((f)(s now ), s)=r = 
Vr'.(r' < r) = 

Vi,j.(z / j'D g{i) ^ g(j)) A (i < m D (j)(s now /g(i))) 
A/(0) = 

AVi.f(i + l)=f(i)+p(g(i),s) 
A f(m) > r' 

Basically, this formula says that sum p (0(s now ), s) is equal to r iff for every value r', r' is 
less than r iff there exists a finite set m of situations (enumerated by the function g(0), . . . , 
g(m)) satisfying <fi whose p values sum to a value (computed by the function /) greater than 
r' . Note that this definition entails that an infinite set of situations has a p sum that is the 

13 As is well known, with 0, 1, and + in the language, we can actually define the natural numbers, since we 
have second-order quantification, as follows: n G N is an abbrevation for VP((0 6 PA Vy(y G P D y + 1 £ 
P)) D n e P). 

14 Note that we can easily do without such variables by including some unary "type" predicates in our 
language. 
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limit of the sum of p over its elements. In particular, the resulting probability distribution 
(achieved once we normalize the sum) will be a discrete distribution. 
Now we can formally define Bel(0(s„ ow ), s) as 

BEh(4>(s now ), s) = sum p (0(s not „),s)/sump(TRUE, s), 

where true is satisfied by all situations. 

In Section |5| we imposed the following constraints on p: 

Vs.(p(a, S ) > 0) A (-.#(*, Sq) D p(s, So) = 0). 

Now we can add the additional constraint: 

Ir.r > A sum p (TRUE, So) = r, 

where r is a variable of sort real. This constraint ensures that the sum of p over all situ- 
ations is, in 5*o, a finite positive number. From these constraints it follows, that for every 
formula 0, sum p (0(s„ ow ), So) exists and, since p is non-negative, is less than (or equal to) 
sum p (<f)(s now ), So). It is then immediate that in So, Bel is a probability distribution over 
the space of situations. Finally, the constraint ->K(s, So) D p(s, So) = entails that all that 
is known is assigned degree of belief 1. 

What about Bel in situations other than So? By identical reasoning, in any situation 
s, Bel will be a probability distribution over the set of situations if the sum of p over all 
situations is, in s, a finite positive number and p is non-negative. To guarantee this we 
require that the action-likelihood functions (Defn. [?]) be probability distributions over the 
set of actions. 

First we define sum^(0(a now , s), s) to be the sum of £(a now , s) over all actions a now that 
satisfy the formula <f)(a now ,s) in situation s. This can be defined in an identical manner 
to sum p , the only change required is to sum the values of i(g(i), s) instead of the values of 
p(g(i), s) in the formula above. We then impose the following constraints: 

1. Va,sl(a,s) > 0. 

2. Va, s.sum^(Oi(a, a now , s), s) = 1. 

The first formula says the action-likelihood function is non-negative, while the second formula 
ensures that for every action a the sum of the likelihood function over all actions a now 
that are observationally indistinguishable to a in situation s is one. Together these two 
formulas make the action-likelihood function a discrete probability distribution over each set 
of observationally indistinguishable actions. 

With this in hand, the successor-state axiom for p (Defn. |^) gives us what we require. 
Given that suhi p (true, s) is finite the axiom shows that for any action a, sum p (TRUE, do(a, s)) 
remains finite since in the term p(s', s) x £(a', s'), £(a', s') has a finite sum over the a'. Since 
sum p (TRUE, So) is finite, we obtain by induction that it remains finite for every situation 
(since all situations arise from applying a finite sequence of actions to So). Finally, the 
successor-state axiom for K (Defn. |j) shows that only i^-related situations get positive val- 
ues of p. Hence in every situation, not just S , all that is known is assigned degree of belief 
1. 
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